Abstract

Ordinal  categorical  data  (OCD),  such  as  opinion  rankings,  are  common  in  many  areas  of 
application.  In  the  Air  Force,  Cooper-Harper  ratings  are  used  extensively  for  the  assessment  of 
Flying  Qualities.  OCD  is  not,  however,  a  ratio-scale  measurement  and  cannot  be  treated  as 
ordinary  numbers.  Notwithstanding  this,  the  ordinal  scores  are  often  regarded  as  ratio-scale  and 
analyzed  incorrectly  using  means  and  variances.  A  method  of  correct  analysis  of  OCD  leading  to 
statistically  valid  hypothesis  tests  and  based  on  a  method  of  probability  scoring  or  ‘Ridits,’  has 
found  wide  applicability  for  other  large-data-set  applications  such  as  Epidemiology.  This  paper 
explains  the  use  of  Ridits  and  examines  how  we  might  effect  a  Ridit  analysis  on  the  often  sparse 
data sets in many Flying Qualities applications[i]. The method of this paper is to fit empirical Beta
distributions  to  observed  data,  and  then  to  use  a  randomization  approach  to  make  inferences  on 
the  difference  between  distributions  based  on  a  distance  metric. 

Key  words:  Borg  scale  rating;  Cooper-Harper;  flying  qualities;  Hellinger  distance;  human 
factors;  ordinal  categorical  data;  Ridit;  sparse  data;  statistical  defensibility. 


1  Introduction 

Ordinal categorical data[ii], such as opinion rankings for categories of products or other
items, are common in many fields where ratio-scale measurements are unavailable[iii]. In the Air
Force, Cooper-Harper ratings (Cooper & Harper, 1969; Harper & Cooper, 1986) are used
extensively for the assessment of Flying Qualities (Wilson & Riley, 1989, 1990). An often-made
assumption (Agresti, 1984, pg. 2) is that there is a latent but unobserved continuous ratio scale[iv]
underlying the observed ordinal choices. However, such ordinal scores are often treated as
ratio-scale measurements, which they are not, and analyzed incorrectly using means and variances. A
method of correct analysis leading to statistically valid hypothesis tests and based on a method of
probability scoring or ‘Ridits,’ was first proposed by Bross (1958). Bross named the Ridit after
‘with Reference to an Identified Distribution.’ Ridit analysis was later formalized by Brackett
and Levine (1977), grounding the concept of a Ridit on the basis of intuitively reasonable
postulates.

The  following  exposition  explains  the  use  of  Ridit  analysis  and  examines  how  we  might 
analyze  sparse  data,  common  in  many  Cooper-Harper  Flying  Qualities  applications,  using  Ridit 
and  related  analysis  techniques.  Ridit  analysis  is  simple  to  compute,  and  permits  statistics  such 
as  hypothesis  test  power,  necessary  to  determine  if  a  proposed  test  plan  is  statistically  defensible, 
to  be  estimated  as  well.  We  base  our  presentation  on  two  examples: 

1.  A  college  course  evaluation  by  students.  This  introduces  basic  concepts  and 
notation.[v]

2.  Fatigue  scores  by  pilots  flying  several  sorties  at  increasing  levels  of  G-stress. 

In  many  applications  where  one  is  asked  to  compare  OCD  results  taken  under  different 
conditions — for  example,  different  flight  configurations — one  is  faced  with  the  problem  of  small 
sample  sizes.  The  standard  Ridit  analysis,  as  found  in  the  literature  (for  example,  Selvin,  1977, 
2004)  applies  correctly  to  large  sample  sizes  and  it  is  thus  necessary  to  discover  a  way  to  better 
treat  the  analysis  in  the  case  of  small  samples.  It  is  the  contribution  of  this  paper  to  suggest  an 
approach  that  does  not  depend  on  the  large-sample  Normal-distribution  approximation. 

We  introduce  standard  Ridit  analysis  in  Example  1  below,  and  then  apply  it  to  a  small 
sample  case  in  Example  2.  This  second  example  will  show  how  erroneous  confidence  intervals 
arise  using  the  standard  approach.  In  the  last  section  of  this  paper,  we  derive  an  alternative 
analysis  that  does  not  require  Normal-distribution  assumptions,  and  apply  it  to  the  data  of 
Example  2. 


2  Examples 

2.1  Example  1:  Evaluation  of  a  course  by  students 

Consider  the  following  data  analysis  (Croushore  &  Schmidt,  2010):  Students  were  asked 
to  enter  scores  to  the  question:  ‘This  course  fulfilled  my  expectation’  on  a  questionnaire  with 
their answers chosen from 5 ordered rankings from ‘Strongly Disagree’ through ‘Strongly
Agree.’  There  were  5  in  the  #1  ‘Comparison’  category,  8  in  the  #2  category,  etc.  These 
cumulated  scores  were  compared  to  the  previous  year’s  score  (the  ‘Reference’  column)  where 
there  were  3  in  the  #1  category,  6  in  the  #2  category,  etc. 
Table 1  Student preference score frequencies in each of two years

Preference           #   Comparison   Reference
Strongly Disagree    1        5           3
Disagree             2        8           6
Neither A. nor D.    3        6           6
Agree                4        2           4
Strongly Agree       5        6           8
SUM                          27          27

The  important  aspects  of  this  type  of  data  are: 

1.  The  score  categories  are  ranked  in  some  ascending  (or  descending)  order 

2.  The  rankings  are  recognized  as  possibly  quite  different  in  ‘distance’  apart 

Let $V$ and $X$ denote independent discrete random variables taking values in the set
$\{1, \ldots, K\}$, drawn respectively from a Reference population V and a Comparison population X.
In this example, $K = 5$. Let $q_k = P(V = k)$ and $p_k = P(X = k)$, with $k \in \{1, \ldots, K\}$,
and denote the column vectors $(q_1, \ldots, q_K)'$ and $(p_1, \ldots, p_K)'$ by $q$ and $p$
respectively. Vectors such as $q$ and $p$ form probability distributions over $\{1, \ldots, K\}$.

$p$ is estimated by dividing the observed frequency, or count, in the comparison
population’s cell entries by the total number of X counts ($m$) for that population; for example,
$\hat{p}_1 = 5/27 = 0.185$, where ‘^’ indicates an estimated value. Thus
$\hat{p}' = (\hat{p}_1, \ldots, \hat{p}_K)$. In what follows, we dispense with the ‘^’ notation and
refer simply to $p$, mentioning the difference as a need arises. $q$ is defined similarly, with the
total number of V counts being $n$. In the current example, both $m$ and $n$ equal 27.

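Definition:

The k-th Ridit for the reference population V is

$$ r_k = \begin{cases} \tfrac{1}{2} q_1 & \text{for } k = 1, \\ q_1 + \cdots + q_{k-1} + \tfrac{1}{2} q_k & \text{for } k > 1. \end{cases} \tag{1} $$

The k-th Ridit for the comparison population X is

$$ \text{($k$-th Ridit for X)} = \begin{cases} \tfrac{1}{2} p_1 & \text{for } k = 1, \\ p_1 + \cdots + p_{k-1} + \tfrac{1}{2} p_k & \text{for } k > 1. \end{cases} \tag{2} $$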

Intuitively, a Ridit is akin to the cumulative distribution function of its given
population, with a splitting of the k-th category in half. All the quantities necessary for a Ridit
analysis are easily computed in a spreadsheet program, as we see in Table 2, which shows Table 1
expanded by estimated values for p, q, rp and rq.


Table 2  Original data plus Ridit calculations for student scores

Preference           #   Comparison   Reference   r (ridits)     p       q       rp      rq
Strongly Disagree    1        5           3          0.056     0.185   0.111   0.010   0.006
Disagree             2        8           6          0.222     0.296   0.222   0.066   0.049
Neither A. nor D.    3        6           6          0.444     0.222   0.222   0.099   0.099
Agree                4        2           4          0.630     0.074   0.148   0.047   0.093
Strongly Agree       5        6           8          0.852     0.222   0.296   0.189   0.252
SUM                          27          27                                    0.411   0.500

The sum of column rp, which is the inner product of vectors $r$ and $p$ (that is, $r'p$ in
vector notation), equals 0.411. This sum is denoted $R(p|q)$ and is called ‘the mean Ridit of the
reference population with respect to the comparison population.’[vi] That is

$$ R(p|q) = \sum_{k=1}^{K} r_k p_k = 0.411 \tag{3} $$

This quantity is an estimate of the expectation of the Ridits of V (that is, the $r$) under the
distribution (that is, the $p$) of the Comparison population X, or $E_X(r)$.
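The Ridit and mean-Ridit calculations of Table 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the spreadsheet actually used; the function names `ridits` and `mean_ridit` are our own and do not come from the references.

```python
def ridits(ref_counts):
    """Ridit scores r_k of equation (1), computed from reference counts."""
    n = sum(ref_counts)
    scores, cumulated = [], 0.0
    for count in ref_counts:
        q_k = count / n
        scores.append(cumulated + 0.5 * q_k)  # mass below k plus half of k
        cumulated += q_k
    return scores


def mean_ridit(comp_counts, ref_counts):
    """Mean Ridit R(p|q) of equation (3): the inner product r'p."""
    m = sum(comp_counts)
    r = ridits(ref_counts)
    return sum(r_k * count / m for r_k, count in zip(r, comp_counts))


comparison = [5, 8, 6, 2, 6]  # this year's counts (Table 1)
reference = [3, 6, 6, 4, 8]   # last year's counts (Table 1)

print([round(r_k, 3) for r_k in ridits(reference)])
print(round(mean_ridit(comparison, reference), 3))  # 0.411, as in Table 2
print(round(mean_ridit(reference, reference), 3))   # 0.5, reference vs. itself
```

Working from counts rather than rounded proportions avoids accumulating rounding error in the cumulated sums.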

For any $k \le K$, $r_k$ is the cumulated sum of the known or observed probabilities of the
Reference population V (i.e. the $q_j$, $j < k$) up to and including $\tfrac{1}{2} q_k$; it is thus,
intuitively, the probability that a response from the Reference population V is less than the
‘middle’ of the k-th category.[vii] It can be shown[viii] that

$$ R(p|q) = P(V < X) + \tfrac{1}{2} P(V = X) \tag{4} $$
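Result (4) can be checked numerically on the student data by enumerating the joint probabilities of independent draws of V and X. The sketch below is only an illustration; the function name is hypothetical.

```python
def prob_left_plus_half_tie(comp_counts, ref_counts):
    """P(V < X) + 0.5 * P(V = X) for independent V ~ q and X ~ p."""
    m, n = sum(comp_counts), sum(ref_counts)
    total = 0.0
    for k, x_count in enumerate(comp_counts):
        for j, v_count in enumerate(ref_counts):
            joint = (v_count / n) * (x_count / m)  # P(V = j, X = k)
            if j < k:
                total += joint
            elif j == k:
                total += 0.5 * joint
    return total


comparison = [5, 8, 6, 2, 6]  # counts for X (Table 1)
reference = [3, 6, 6, 4, 8]   # counts for V (Table 1)

# Agrees with the mean Ridit R(p|q) = 0.411 of equation (3).
print(round(prob_left_plus_half_tie(comparison, reference), 3))
```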


Consider an interpretation of the mean Ridit $R(p|q)$: By (4) it is clear that $R(p|q)$ is the
probability that the reference distribution V lies to the left of the comparison distribution X, with
the ‘break-even’ situation being $R(p|q) = \tfrac{1}{2}$. If $R(p|q) > \tfrac{1}{2}$ (i.e. if the
probability $R(p|q)$ is higher than ½), then $\{q_k\}$, the probability mass of V, will lie mostly to
the left of $\{p_k\}$, the probability mass of X, as equation (1) implies. In Student Evaluation
terms, this implies V is closer to the ‘Strongly Disagree’ end of the scale than X. The reverse is
true if $R(p|q) < \tfrac{1}{2}$, in which case X will be to the ‘left’ of V or, on average, closer to
the ‘Strongly Disagree’ end of the scale than V.

$R(p|q)$ is thus a proxy for the probability that, on average, an individual drawn at
random from the Reference population V is ‘to the left’ of a random individual from the
Comparison population; in other words: $R(p|q) \approx P(V < X)$. The higher the value of $R(p|q)$,
the more likely a V-individual will be ‘to the left’ (that is, in this example, to score ‘#1 = Strongly
Disagree’) compared to an X-individual, and vice-versa.

Since

$$ R(q|q) = \sum_{k=1}^{K} r_k q_k = r'q = \tfrac{1}{2} \sum_{k=1}^{K} q_k^2 + \sum_{j<k} q_j q_k = \tfrac{1}{2} (q_1 + \cdots + q_K)^2 , $$

it follows that $R(q|q) = 0.5$, and this holds for any $q$. That is, the mean Ridit of the Reference
population V with respect to itself is the inner product $r'q$, and always equals ½.

A  one-sided  null  hypothesis  of  interest  in  comparing  the  mean  Ridits  of  the  two 
populations  to  test  if  {V  is  ‘to  the  left’  of  X}  is 

$$ H_0: R(p|q) \ge \tfrac{1}{2} \tag{5} $$


If we reject this hypothesis, by observing that $R(p|q)$ is significantly less than ½, we
may conclude that the probability that {V is to the left of X} is low, and therefore it is rather the
X scores that are ‘to the left’ of V scores on average. A test of the null hypothesis can be
executed by forming the statistic Z, where

$$ Z = \frac{\hat{r} - \tfrac{1}{2}}{\sqrt{\mathrm{Var}(\hat{r})}} \tag{6} $$

and where $\hat{r} = R(\hat{p}|\hat{q})$, the estimate of the mean Ridit given by entering the
observed values of $p$ and $q$ into $R(p|q)$. Z has an approximately Normal (0, 1) distribution for
large enough $m$ and $n$.[ix]

The variance of $\hat{r}$ is sometimes[x] taken as $1/(12m)$; this assumes that $q$ is known, which
it seldom is, and this variance estimate is better replaced by the more conservative estimate

$$ \mathrm{Var}(\hat{r}) = \frac{1}{12m} + \frac{1}{12n} \tag{7} $$


which is a variance formula given by Selvin (1977). Doing the Z-test for the Student Scores
example with these formulas gives

$$ Z = \frac{0.411 - 0.5}{\sqrt{\dfrac{1}{12 \times 27} + \dfrac{1}{12 \times 27}}} \tag{8} $$


So $Z = -0.089/0.0786 = -1.132$. The critical Normal (0, 1) left-tail Z value for a one-sided
hypothesis test at the 95% level is $z_{0.05} = -1.645$, so the test result is: ‘No significant
difference detected’ between this year’s and last year’s student scores.[xi] That is, there is no
evidence to suspect that X is ‘to the left’ of V.
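The arithmetic of (6) through (8) can be reproduced with the Python standard library. This is a sketch under the large-sample Normal approximation described above; the function name `ridit_z` is our own.

```python
import math
from statistics import NormalDist


def ridit_z(mean_ridit, m, n):
    """Z statistic of equation (6) using the variance of equation (7)."""
    std_err = math.sqrt(1.0 / (12 * m) + 1.0 / (12 * n))
    return (mean_ridit - 0.5) / std_err, std_err


z, std_err = ridit_z(0.411, m=27, n=27)
critical = NormalDist().inv_cdf(0.05)  # left-tail 5% point, about -1.645
p_value = NormalDist().cdf(z)          # one-sided (lower-tail) p-value

print(f"std. error = {std_err:.4f}, Z = {z:.3f}, critical = {critical:.3f}")
print("reject H0" if z < critical else "no significant difference detected")
```

Because Z lies above the critical value, the sketch reports no significant difference, in line with the conclusion in the text.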

The  example  above  served  to  introduce  Ridit  definitions  and  the  usual  mean-Ridit  test  for 
the  factor  ‘Student  Year’  being  presented  at  two  levels,  namely:  Current  year  and  Previous  year. 
We now examine an example where the input factor is given over several levels. In addition,
we  examine  how  we  might  construct  confidence  intervals  for  the  Ridit  means,  and  how  the  fact 
of  small  data  samples  may  affect  the  confidence  intervals  for  these  means. 


2.2  Example  2:  Analysis  of  Borg-scale  Fatigue  levels  over  several  stages  of  G 


Five  pilots  were  assigned  to  fly  several  repeated  sorties  at  increasing  G  (gravitational 
stress)  levels  and  their  ‘Fatigue’  was  measured  by  responses  scored  by  an  adjusted  Borg-scale 
measure.  The  standard  Borg  scale  measures  physiological  exertion  expressed  on  a  range  of  6  to 
20,  with  6  being  ‘no  exertion  at  all.’  The  adjusted  scale  used  in  the  present  example  went  from  0 
through  10,  with  0  being  ‘no  exertion  at  all.’  This  adjusted  scale  is  seen  to  be  very  similar  to  the 
Cooper-Harper  scale.  These  scores  are  plotted  in  Figure  1.  Note  that  G1.5  was  set  at  slightly 
above  stationary,  ground-level  G.  G85  and  G95  refer  to  repeated  maneuvers  at  G8  and  G9 
respectively.  No  observed  score  exceeded  level  5. 
Figure 1  Adjusted Borg scores for ‘Fatigue’ vs. increasing G levels


The counts of recorded Borg scores for each G level were entered into Table 3. For example, in
column G1.5, there were 8 scores of ‘0’, one score of ‘1’, etc. This layout enables the Ridit
calculations to be easily done.


Table 3  Adjusted Borg scores given by pilots flying at increasing G levels

Score   G1.5    G5    G6    G7    G8   G85   G95   Reference
0          8     6     0     2     0     0     0       8
1          1     3     1     5     3     0     0       1
2          1     1     2     2     6     3     1       1
3          0     0     3     1     1     4     6       0
4          0     0     2     0     0     3     1       0
5          0     0     2     0     0     0     2       0
SUM       10    10    10    10    10    10    10      10

Taking G1.5 as a fixed baseline, and comparing the fatigue scores for increasing G levels,
the 95% Bonferroni-adjusted t-value (lower-tail) percentile for $k = 6$ comparisons against G1.5 as
the reference distribution (d.f. = 18, and using $1 - \alpha/(2k)$) is 0.004; thus we see by these
t-tests that G6, G8, G85 and G95 are significantly different from G1.5. The complete results are
given in Table 4.
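The mean Ridits of each G level against the G1.5 reference column can be sketched as follows. The counts are those of Table 3; the variable names are our own, and the sketch covers only the Ridit values and the Bonferroni tail probability, not the t-tests themselves.

```python
def ridits(ref_counts):
    """Ridit scores r_k of equation (1), computed from reference counts."""
    n = sum(ref_counts)
    scores, cumulated = [], 0.0
    for count in ref_counts:
        scores.append(cumulated + 0.5 * count / n)
        cumulated += count / n
    return scores


table3 = {  # counts of adjusted Borg scores 0..5 at each G level
    "G1.5": [8, 1, 1, 0, 0, 0],
    "G5":   [6, 3, 1, 0, 0, 0],
    "G6":   [0, 1, 2, 3, 2, 2],
    "G7":   [2, 5, 2, 1, 0, 0],
    "G8":   [0, 3, 6, 1, 0, 0],
    "G85":  [0, 0, 3, 4, 3, 0],
    "G95":  [0, 0, 1, 6, 1, 2],
}

r = ridits(table3["G1.5"])  # G1.5 is the reference distribution
means = {
    level: sum(r_k * c / sum(counts) for r_k, c in zip(r, counts))
    for level, counts in table3.items()
}
for level, value in means.items():
    print(f"{level:>4}: {value:.3f}")

alpha, k = 0.05, 6
print(f"Bonferroni tail probability alpha/(2k) = {alpha / (2 * k):.4f}")
```

Note that the Ridits for scores 3 to 5 all equal 1.0 here, because G1.5 has no observations in those categories.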
Table 4  Ridits for the Borg Fatigue scores vs. increasing G levels, and the probability that
each value is greater than the G1.5 reference level by Bonferroni-adjusted t-value

               G1.5     G5      G6      G7      G8      G85     G95
ridit value    0.500   0.590   0.975   0.795   0.925   0.985   0.995
probability    0.500   0.252   0.001   0.019   0.002   0.001   0.000

95%  confidence  intervals  at  each  G-value  might  be  constructed  using  these  Ridit  values 
and  the  adjusted  t-value,  as  shown  in  Figure  2. 


Figure 2  Plot of Ridits and their 95% Bonferroni-adjusted confidence intervals for
the Borg Fatigue scores vs. G

The figure bears out the conclusions of Table 4 except for an overlap of the 0.5 line at
G7.  As  will  also  be  noted,  some  of  the  confidence  intervals  overlap  the  endpoints  of  the  (0,  1) 
interval  to  which  Ridits  are  constrained,  and  the  graph  of  this  analysis  should  be  taken  as  an 
approximate  indication  of  significant  fatigue-level  differences. 


3.1  A  Distance-Based  Approach 
So far we have shown that standard Ridit analysis applies quite well to inferences on the
differences of means even in the case of small samples. However, the problem of incorrect
confidence intervals needs to be addressed. One solution to this problem is to take
a  transform  of  the  Ridits  that  might  better  approximate  a  Normal  distribution  for  the  small- 
sample  case.  Such  an  approach  is  discussed  in  Hurwitz  (2015);  the  transform  that  was  taken  was 
the  logistic  transform,  and  an  assumption  was  made  that  this  gave  a  Normally-distributed 
situation  that  could  be  used  in  making  inferences  about  Ridit  means  and  their  confidence 
intervals.  This  solved  the  problem  of  inappropriate  confidence  bounds  as  evinced  in  Figure  2 
above.  However,  it  is  not  always  certain  that  the  logit  transform — or  any  transform — will  give  an 
adequate  approximation  to  Normality.  In  the  following  discussion  we  will  take  a  different 
approach  to  the  problem. 

Consider the problem of comparing the ‘distance’ between any two discrete probability
distributions $\{p_i\}$ and $\{q_i\}$ defined over a common domain. One such measure is the
discrete-probability-distribution version of the (squared) ‘Hellinger Distance’ (Yang & Le Cam, 2000)

$$ H^2(p, q) = 1 - BC(p, q) \tag{9} $$

where $BC(p, q)$ is the ‘Bhattacharyya Coefficient’ (Bhattacharyya, 1943)

$$ BC(p, q) = \sum_{\text{all } i} \sqrt{p_i q_i} . \tag{10} $$

The maximum Hellinger distance 1 is achieved when $p_i$ assigns probability zero to every
set to which $q_i$ assigns a positive probability, and vice versa. In Table 5 we show the
consequence of using $H^2$ to gauge the distance between discrete distributions of fatigue ratings.

Table 5  Three Hypothetical Discrete OCD Fatigue Distributions

Score    i    G1    G2    G3
0        1     8     0     0
1        2     0     0     8
2        3     2     0     0
3        4     0     1     2
4        5     0     5     0
5        6     0     4     0
SUM           10    10    10

It is clear what will happen to $H^2$ when the ratings are turned into their corresponding
probability values: columns G1 and G2 share no category, so $BC = 0$ and $H^2 = 1$, as we’d expect
(as the distributions are ‘far apart’), but columns G1 and G3 will also have $H^2 = 1$, which we do
not expect between two distributions that are ‘quite close together.’
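A short sketch of equations (9) and (10) applied to the counts of Table 5 makes the difficulty concrete: whenever two columns share no category, the Bhattacharyya coefficient is zero and the squared Hellinger distance takes its maximum value, however near the occupied categories are to each other. The function name is our own.

```python
import math


def squared_hellinger(counts_a, counts_b):
    """Discrete squared Hellinger distance, equations (9) and (10)."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    bc = sum(
        math.sqrt((a / n_a) * (b / n_b))  # Bhattacharyya coefficient terms
        for a, b in zip(counts_a, counts_b)
    )
    return 1.0 - bc


g1 = [8, 0, 2, 0, 0, 0]
g2 = [0, 0, 0, 1, 5, 4]
g3 = [0, 8, 0, 2, 0, 0]

print(squared_hellinger(g1, g2))  # disjoint categories, far apart
print(squared_hellinger(g1, g3))  # disjoint categories, yet adjacent scores
print(round(squared_hellinger(g1, g1), 12))  # a distribution vs. itself
```

Both comparisons return the same distance, so the discrete measure cannot distinguish ‘far apart’ from ‘adjacent but non-overlapping’.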
A solution here is to fit continuous distributions over the discrete ones we observe, and
then to apply the continuous-distribution version of $H^2$ to these instead. First, however, we need
to decide on how to construct the ‘common domain’ of Ridits ($r_i$) on which the definition of $H^2$
will depend. In our previous ‘standard’ Ridit treatment, as can be seen in Table 3 above, we took
one column, namely G1.5, as the ‘reference distribution’ and compared the other columns to
it. This has the advantage of carrying the idea of independent means through to our inferences.
However, the ‘domain’ that we implicitly use is then restricted to those three rows of Table 3
where the counts corresponding to G1.5 are non-zero, namely rows 1, 2, and 3. The remaining three
rows then have Ridits equal to 1.0, as those are the cumulated value of the probabilities for the
G1.5 observations. A more satisfactory construct for a domain would be to have Ridit values
distributed more evenly across the [0, 1] range, and this can be achieved by making the marginal
row sums the new reference distribution, as shown in Table 6.

Table 6  Ridit Reference Distribution based on Row Sums

Score   G1.5    G5    G6    G7    G8   G85   G95   Ref=Row Sum
0          8     6     0     2     0     0     0       16
1          1     3     1     5     3     0     0       13
2          1     1     2     2     6     3     1       16
3          0     0     3     1     1     4     6       15
4          0     0     2     0     0     3     1        6
5          0     0     2     0     0     0     2        4
SUM       10    10    10    10    10    10    10       70

Table  7  shows  the  corresponding  probability  computations  for  the  seven  columns  of 
observed  Fatigue  scores.  These  will  be  the  same  as  those  computed  for  Table  3.  The  difference 
here  is  that  the  Ridit  column  ‘r’  is  now  based  on  the  marginal  row  sums  rather  than  just  on  the 
G1.5  column.  The  domain  of  our  probabilities  is  now  more  evenly  spread  across  [0,  1].  The 
mean Ridits are also shown; the formula is the usual $\bar{x} = r'p = \sum_i r_i p_i$, one mean for each column.

Table 7  Prob. Distributions of Fatigue Scores, with Ridits (r) based on Row Sums.
Probability means over ‘r’ are shown on last line.

Score            p1      p2      p3      p4      p5      p6      p7       r
0              0.800   0.600   0.000   0.200   0.000   0.000   0.000   0.114
1              0.100   0.300   0.100   0.500   0.300   0.000   0.000   0.321
2              0.100   0.100   0.200   0.200   0.600   0.300   0.100   0.529
3              0.000   0.000   0.300   0.100   0.100   0.400   0.600   0.750
4              0.000   0.000   0.200   0.000   0.000   0.300   0.100   0.900
5              0.000   0.000   0.200   0.000   0.000   0.000   0.200   0.971
MEANS = Σ(rp)  0.176   0.218   0.737   0.364   0.489   0.729   0.787
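The row-sum reference distribution of Table 6 and the Ridit means of Table 7 can be reproduced with the sketch below. The counts come from Table 3; the helper and variable names are our own.

```python
def ridits(ref_counts):
    """Ridit scores r_k of equation (1), computed from reference counts."""
    n = sum(ref_counts)
    scores, cumulated = [], 0.0
    for count in ref_counts:
        scores.append(cumulated + 0.5 * count / n)
        cumulated += count / n
    return scores


levels = ["G1.5", "G5", "G6", "G7", "G8", "G85", "G95"]
counts = [  # rows are scores 0..5, columns follow `levels`
    [8, 6, 0, 2, 0, 0, 0],
    [1, 3, 1, 5, 3, 0, 0],
    [1, 1, 2, 2, 6, 3, 1],
    [0, 0, 3, 1, 1, 4, 6],
    [0, 0, 2, 0, 0, 3, 1],
    [0, 0, 2, 0, 0, 0, 2],
]

row_sums = [sum(row) for row in counts]  # the new reference distribution
r = ridits(row_sums)                     # Ridit column 'r' of Table 7

means = {}
for j, level in enumerate(levels):
    column = [row[j] for row in counts]
    total = sum(column)
    means[level] = sum(r_k * c / total for r_k, c in zip(r, column))

print(row_sums)
print([round(r_k, 3) for r_k in r])
print({level: round(value, 3) for level, value in means.items()})
```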

Figure  3  illustrates  the  shapes  of  the  seven  observed  probability  distributions,  given  as 
vertical  lines  across  the  domain  of  the  r’s,  with  a  line  connecting  the  tops  of  each  vertical  line. 
Figure 3  Observed Probability Distributions of Fatigue Scores


The next step is to fit appropriate continuous distributions to the observed probabilities. A
flexible choice for this is the Beta($\alpha$, $\beta$) family of distributions which have support
(i.e. domain) over (0, 1) and shapes similar to those of the observed (discrete) distributions. We
obtain estimates of the required seven ($\alpha$, $\beta$) pairs via the method of moments formulas
for the Beta distribution:

$$ \alpha = \bar{x} \left( \frac{\bar{x}(1 - \bar{x})}{v} - 1 \right), \quad \text{if } v < \bar{x}(1 - \bar{x}), \tag{11} $$

$$ \beta = (1 - \bar{x}) \left( \frac{\bar{x}(1 - \bar{x})}{v} - 1 \right), \quad \text{if } v < \bar{x}(1 - \bar{x}), \tag{12} $$

where the $\bar{x}$'s are given by the $\sum rp$'s in Table 7, and $v$ is an estimated variance.
We already have the estimates for the $\bar{x}$'s. The estimates for the $v$'s are derived by
following the formulas for a weighted variance (our observations are given in weighted form, the
weights being the probabilities $p_i$). The weighted variance formula, with weights $w_i$,
$\sum w_i = 1$, is

$$ v = \sum_i w_i (x_i - \bar{x})^2 . \tag{13} $$

This translates, in our case, to

$$ v = \sum_i p_i (r_i - \bar{x})^2 . \tag{14} $$

$\bar{x}$ is the ‘rp’ mean for that column as given in Table 7.

One variance is computed for each column of $p_i$'s. The results are shown in Table 8. The
first two rows are for the check that $v < \bar{x}(1 - \bar{x})$; all instances pass this check.

Table 8  Variance, check, and Beta Distribution parameters by Method of Moments

                   1       2       3       4       5       6       7
                 G1.5     G5      G6      G7      G8      G85     G95
variance         0.018   0.019   0.042   0.034   0.016   0.021   0.016
mean(1-mean)     0.145   0.170   0.194   0.232   0.250   0.198   0.168
alpha            1.281   1.705   2.638   2.139   7.059   6.132   7.678
beta             5.979   6.120   0.941   3.734   7.389   2.285   2.076
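The method-of-moments fit of equations (11) through (14) is sketched below for the G1.5 column, using the row-sum Ridits as the domain. The function name is hypothetical; the other columns follow by changing the probability vector.

```python
def beta_method_of_moments(probs, ridit_scores):
    """Return (alpha, beta) from equations (11)-(14) for one column."""
    mean = sum(p * r for p, r in zip(probs, ridit_scores))
    var = sum(p * (r - mean) ** 2 for p, r in zip(probs, ridit_scores))
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("method of moments requires 0 < v < mean(1 - mean)")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


# Ridits based on the row sums 16, 13, 16, 15, 6, 4 (total 70) of Table 6.
row_sums = [16, 13, 16, 15, 6, 4]
r, cumulated = [], 0.0
for count in row_sums:
    r.append(cumulated + 0.5 * count / 70)
    cumulated += count / 70

p_g15 = [0.8, 0.1, 0.1, 0.0, 0.0, 0.0]  # G1.5 column of Table 7
alpha, beta = beta_method_of_moments(p_g15, r)
print(round(alpha, 3), round(beta, 3))  # compare with column 1 of Table 8
```

The explicit check on the variance mirrors the first two rows of Table 8: without it, a column that violates the condition would silently produce non-positive parameters.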

Now  we  are  in  a  position  to  construct  fitted  Beta  distributions — one  for  each  column  of 
the  Fatigue  scores — against  the  observed  probability  histogram.  Figure  4  shows  cumulated 
distribution  fits  for  all  seven  distributions. 
As can be seen, the fitted continuous Beta distributions are reasonable approximations to
the  observed  discrete  probability  distributions.  This  will  form  the  basis  of  our  inferences. 

We now compute the squared Hellinger distance between the continuous G1.5 and G8
distributions, and use it as our metric for ‘distance apart.’ The formula for $H^2$ given for
continuous distributions is

$$ H^2 = 1 - \frac{B\left( \dfrac{\alpha_1 + \alpha_2}{2}, \dfrac{\beta_1 + \beta_2}{2} \right)}{\sqrt{B(\alpha_1, \beta_1)\, B(\alpha_2, \beta_2)}} \tag{15} $$

where ‘B’ is the Beta function.

Computing all the $H^2$ distances gives a symmetric matrix with 0’s on the diagonal; the
0’s show the distance of a distribution from itself, and the matrix is symmetric since the distance
is the same in either direction. This is shown in Table 9.
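Equation (15) can be evaluated with the log-gamma function from the Python standard library, which avoids overflow in the Beta function for large parameters. This is a sketch only; the function names are our own, and the parameter pairs are those of Table 8.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def squared_hellinger_beta(a1, b1, a2, b2):
    """Squared Hellinger distance between Beta(a1, b1) and Beta(a2, b2)."""
    log_bc = log_beta((a1 + a2) / 2.0, (b1 + b2) / 2.0) - 0.5 * (
        log_beta(a1, b1) + log_beta(a2, b2)
    )
    return 1.0 - math.exp(log_bc)


params = {  # (alpha, beta) pairs from Table 8
    "G1.5": (1.281, 5.979), "G5": (1.705, 6.120), "G6": (2.638, 0.941),
    "G7": (2.139, 3.734), "G8": (7.059, 7.389), "G85": (6.132, 2.285),
    "G95": (7.678, 2.076),
}

for level, (a, b) in params.items():
    h2 = squared_hellinger_beta(*params["G1.5"], a, b)
    print(f"H2(G1.5, {level}) = {h2:.3f}")
```

Because the table parameters are rounded to three decimals, the printed distances may differ slightly in the last digit from those reported in the paper.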
Table 9  Squared Hellinger Distances between all seven Fatigue distributions

        [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]
[1,]   0.000   0.016   0.648   0.164   0.497   0.773   0.855
[2,]   0.016   0.000   0.593   0.098   0.402   0.722   0.819
[3,]   0.648   0.593   0.000   0.353   0.321   0.056   0.063
[4,]   0.164   0.098   0.353   0.000   0.128   0.427   0.558
[5,]   0.497   0.402   0.321   0.128   0.000   0.325   0.484
[6,]   0.773   0.722   0.056   0.427   0.325   0.000   0.024
[7,]   0.855   0.819   0.063   0.558   0.484   0.024   0.000

The distance between distribution ‘1’ (G1.5) and distribution ‘2’ (G5) is shown as 0.016.
Could this distance have happened by chance? If one were dealing with independent Normal
distributions, one could use a t-test to answer this question. In our present case, we can answer
the question using a randomization trial as follows:

Set the null hypothesis as H0: G1.5 = G5. The alternative is ‘G1.5 ≠ G5’. Assume the
null to be true. Draw a random sample of size 10 from G1.5 (recall, 10 was our original Fatigue
sample size for G1.5). Draw a second random sample from G1.5 as well. Compute $H^2$ for this sample
pair. Repeat (say) 10,000 times and collect the 10,000 values of $H^2$. Now ask: ‘What proportion of
the 10,000 distances are greater than or equal to 0.016?’ This can be computed as a simple ratio
from the $H^2$ data, and estimates the probability ‘p’ of seeing a distance at least this large when
the null hypothesis is true. Doing this, we obtain: p = 0.789. This is a high probability, so we do
not reject H0, the null hypothesis: G1.5 and G5 are too close to tell apart, so we have no evidence
that they are different distributions. Taking the first row of $H^2$ distances and doing the same
using two random samples, both from G1.5, gives the probabilities of Table 10.
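The randomization trial can be sketched as below. The text does not say how $H^2$ is computed for each simulated sample pair; this sketch assumes that a Beta distribution is re-fitted to each sample by the method of moments and that equation (15) is then applied. That step, the function names, and the seed are our assumptions, so the resulting proportions need not match Table 10 exactly.

```python
import math
import random


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def squared_hellinger_beta(a1, b1, a2, b2):
    """Equation (15): squared Hellinger distance between two Betas."""
    log_bc = log_beta((a1 + a2) / 2.0, (b1 + b2) / 2.0) - 0.5 * (
        log_beta(a1, b1) + log_beta(a2, b2)
    )
    return 1.0 - math.exp(log_bc)


def fit_beta(sample):
    """Method-of-moments Beta fit to a sample in (0, 1) (assumed step)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n  # divisor n keeps v < m(1-m)
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def randomization_p_value(alpha, beta, observed_h2, n=10, reps=10_000, seed=0):
    """Proportion of null-sample distances >= the observed distance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        first = [rng.betavariate(alpha, beta) for _ in range(n)]
        second = [rng.betavariate(alpha, beta) for _ in range(n)]
        h2 = squared_hellinger_beta(*fit_beta(first), *fit_beta(second))
        hits += h2 >= observed_h2
    return hits / reps


# Both samples come from the fitted G1.5 distribution (Table 8, column 1).
print(randomization_p_value(1.281, 5.979, observed_h2=0.016))
```

Fixing the seed makes a run repeatable; a different seed or number of repetitions will move the estimated proportion slightly.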

Table 10  $H^2$ distances, and the probability that each distribution = G1.5

         G1.5     G5      G6      G7      G8      G85     G95
H2       0.000   0.016   0.648   0.164   0.497   0.773   0.855
Prob.    1.000   0.789   0.000   0.067   0.000   0.000   0.000

The results in Table 10 are in general agreement with the standard Ridit results given in
Table 4. G1.5 can be taken as equivalent to G5 and G7, but the other distributions are different
from G1.5.


3.2  Confidence  intervals  for  the  differences  in  means 


We have, so far in Section 3, examined the probabilities that two observed distributions
of OCD data are the same or different and we have done so using the squared Hellinger distance.
We could continue and develop confidence intervals around the squared Hellinger distances, but
the results would not be easily interpretable in Engineering terms. We will, instead, take a more
intuitive approach and compute confidence intervals around the Ridit (rp) means, and we do this
by using the fitted Beta distributions. Our (randomization-based) method is:

1.  Draw two samples, each of size n = 10, at random, one from each of two Beta distributions

2.  Compute the difference between the means

3.  Do this 10,000 times

4.  Compute a $100(1 - \alpha/(2KK))$% CI based on proportions, where KK = number of
comparisons we will make (this gives the Bonferroni correction).

This method, comparing the G1.5 mean to all other means, gives the confidence intervals in
Table 11.
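The four steps above can be sketched as follows for the G1.5 and G95 pair. The Beta parameters are those of Table 8; the function name, the seed, and the simple order-statistic choice of percentiles are our assumptions, so the limits will be close to, but not identical with, those reported.

```python
import random


def mean_difference_ci(params_a, params_b, n=10, reps=10_000,
                       alpha=0.05, n_comparisons=6, seed=0):
    """Bonferroni-adjusted percentile CI for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        mean_a = sum(rng.betavariate(*params_a) for _ in range(n)) / n
        mean_b = sum(rng.betavariate(*params_b) for _ in range(n)) / n
        diffs.append(mean_b - mean_a)
    diffs.sort()
    tail = alpha / (2 * n_comparisons)  # Bonferroni tail probability
    lower = diffs[int(tail * reps)]
    upper = diffs[min(reps - 1, int((1.0 - tail) * reps))]
    return lower, sum(diffs) / reps, upper


g15 = (1.281, 5.979)  # fitted Beta for G1.5 (Table 8)
g95 = (7.678, 2.076)  # fitted Beta for G95 (Table 8)

lower, mean_diff, upper = mean_difference_ci(g15, g95)
print(f"lower = {lower:.3f}, mean = {mean_diff:.3f}, upper = {upper:.3f}")
```

Because every simulated mean lies in (0, 1), the resulting limits on a difference always lie in (-1, 1), unlike the Normal-theory limits plotted in Figure 2.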


Table 11  Bonferroni-adjusted CI’s on mean differences: G1.5 vs other means

                    G5       G6      G7      G8      G85     G95
upper   99.60%    0.200    0.745   0.376   0.457   0.701   0.753
mean              0.042    0.561   0.188   0.311   0.551   0.612
lower    0.42%   -0.117    0.343   0.001   0.147   0.380   0.451

Figure 5 gives a graphical example of what we have done for the case of the difference
between the means for G1.5 and G95.

Figure 5  Fitted Beta distributions for G1.5 and G95; Mean, Differences & C.I.’s
(panels: ‘Fitted Beta Examples: G1.5 and G95 distributions’; ‘Mean Differences, Mean, and CI Limits’)


Table 10 shows that, with reference to G1.5, the distributions of G5 and G7 are either no
different or close to no different. Table 11 bears this out, with the Ridit confidence intervals for
G5 and G7 either including zero or close to including zero. For the other Ridit mean differences
from G1.5, Table 11 gives, for example, a CI for the G6 difference of [0.343, 0.745] around a mean
difference of 0.561. The distribution of OCD scores (and fitted Beta distribution) for G6 lies to the
right of that for G1.5, and these results indicate that the mean Ridit value for G6 is above that for
G1.5 by, on average, 0.561. This says that we can state, at a 95% level of confidence, that the
probability distribution based on the G6 OCD results has an average that is 0.561 higher than
that for the OCD results taken at G1.5: G6 gives significantly greater Fatigue scores than G1.5.
Conclusion


We  have  developed  a  new  method  for  comparing  the  results  of  OCD  data  distributions 
that  is  not  based  on  the  standard  large-sample  Ridit  analysis  methods.  We  have  used  a  distance- 
based  metric  and  randomization  tests  to  give  inferences  on  distributions,  and  for  confidence 
intervals  on  mean  differences.  The  method  used  in  our  construction  induces  some  dependence 
between  the  means,  and  this  item  needs  to  be  further  investigated.  However,  the  results  presented 
here  are  in  line  with  independent-mean  results  derived  earlier  and  our  new  method,  we  believe, 
gives  a  path  to  a  better  analysis  of  small  sample  OCD  especially  as  no  Normal-distribution 
assumptions  need  be  made,  and  the  confidence  intervals  so  derived  do  not  violate  the  bounds  of 
the  probability  limits. 


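Appendix A

A1:

$$ R(p|q) = P(V < X) + \tfrac{1}{2} P(V = X). $$

Proof of A1:

If drawings from V and X are independent, then for any $k \le K$ the proof follows from
$r_k p_k = (q_1 + \cdots + q_{k-1} + \tfrac{1}{2} q_k)\, p_k$, that is

$$ r_k p_k = P(\{V = 1\} \cap \{X = k\}) + \cdots + P(\{V = k-1\} \cap \{X = k\}) + \tfrac{1}{2} P(\{V = k\} \cap \{X = k\}). $$

Summing over $k = 1, \ldots, K$ gives $P(V < X) + \tfrac{1}{2} P(V = X)$.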
Endnotes

i All flying qualities data in this paper are synthetic, and have been simulated to illustrate Ridit analysis.

ii A general reference is Agresti, 1996.

iii For a discussion on measurement taxonomy, see http://en.wikipedia.org/wiki/Level_of_measurement

iv Measurements which are comparable to each other in terms of size or distance apart.

v The terminology and notation given in Beder & Heim (1990) is followed closely.

vi Beder & Heim, 1990, reverse this wording and give: 'the mean Ridit of the comparison population with respect to the reference population'; our wording, however, is more in line with the actual construct of R(p | q).

vii The 'middle' or the 'median' of a category is not an exact term as a category is ordinal, not ratio-scale.

viii See Appendix A, result A1.

ix Beder & Heim, 1990, formula (17).

x Bross (1958).

xi Note that in Croushore & Schmidt (2010), the variance was taken as 1/(12m), so the estimated standard error in that paper is 0.056.
Overview

412TW


•  Ridit  method/example  -  Course  Rating  by  Students 

-  Basic  Method  &  Notation 

•  Ridit  example  -  Borg  Scores  for  Fatigue  Levels 

-  Means  &  Confidence  Intervals  using  standard  method 

-  Distribution  comparisons  using  a  distance-based 
method 

-  Confidence  Intervals  using  randomization 
Ridit Analysis

• Consider a simple example: 27 students are asked to answer 'Course was good?' from #1 (Strongly Disagree) to #5 (Strongly Agree)

Comparison = this year's counts (proportions p); Reference = last year's counts (proportions q). Preferences run from Bad (#1) to Good (#5).

Preference         #  Comparison  Reference  ridits (r)  p      q      rp     rq
Strongly Disagree  1  5           3          0.056       0.185  0.111  0.010  0.006
Disagree           2  8           6          0.222       0.296  0.222  0.066  0.049
Neither A. nor D.  3  6           6          0.444       0.222  0.222  0.099  0.099
Agree              4  2           4          0.630       0.074  0.148  0.047  0.093
Strongly Agree     5  6           8          0.852       0.222  0.296  0.189  0.252
SUM                   27          27                                   0.411  0.500
Ridit Analysis - (continued)

• Proportions p and q (i.e. estimated probabilities) are computed from the data. E.g. 0.185 = 5/27, etc.

• A population (Last year's) is set as the 'Reference'

• The k-th ridit of the Ref. population is defined as:

$$r_k = \begin{cases} \tfrac{1}{2} q_1, & \text{for } k = 1 \\ q_1 + \cdots + q_{k-1} + \tfrac{1}{2} q_k, & \text{for } k > 1 \end{cases}$$
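
As a minimal sketch of the ridit definition above, the following Python reproduces the ridits and mean ridits of the course-rating example. The function names `ridits` and `mean_ridit` are illustrative choices, not taken from the slides. The counts are those tabulated for the two years, with this year's 'Neither' count taken as 6 so that the column sums to 27.

```python
def ridits(ref_counts):
    """Ridits of the reference population: r_k = q_1 + ... + q_(k-1) + q_k / 2."""
    total = sum(ref_counts)
    q = [c / total for c in ref_counts]
    r, below = [], 0.0
    for qk in q:
        r.append(below + 0.5 * qk)  # mass strictly below, plus half of this category
        below += qk
    return r


def mean_ridit(counts, r):
    """Mean ridit of a population (given as counts) with respect to ridits r."""
    total = sum(counts)
    return sum(c / total * rk for c, rk in zip(counts, r))


this_year = [5, 8, 6, 2, 6]   # Comparison counts, categories #1..#5
last_year = [3, 6, 6, 4, 8]   # Reference counts, categories #1..#5

r = ridits(last_year)
print([round(x, 3) for x in r])             # [0.056, 0.222, 0.444, 0.63, 0.852]
print(round(mean_ridit(this_year, r), 3))   # 0.411
print(round(mean_ridit(last_year, r), 3))   # 0.5
```

The mean ridit of the reference population with respect to itself is always 0.5, which is why 0.5 serves as the comparison value in the hypothesis test.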
Ridit Analysis

[Figure: chart annotated 'To the left' and 'To the right'; vertical axis from 0.000 to 0.300.]




• Form columns rp and rq, and sum (Σ) each one

• Σ rp = 0.411 is the probability that the Reference pop. will be 'to the left' of the Comparison pop. (counting ties as one half)

- If the p's are 'bunched' to the right versus the q's, then Σ rq < Σ rp

- that is, high Σ rp => Reference pop. (q's) is bunched 'to the left' of p's

- that is, high Σ rp => Reference pop. (last year) was worse than this year

• Our HYPOTHESIS is that Σ rp > 0.5. What does this mean?

- If true, then last year's (Reference) scores are worse than this year's

- However, it's obvious that Σ rp = 0.411 < 0.5. So was last year better?

- Can only say this if experimental error = 0 -> We need a statistical test!
Ridit Analysis - Hypothesis Test

• 'Experimental error' means that, if the underlying situation stays the same, but we draw a new sample, the numbers (p's and q's) we see will be somewhat different. So conclusions might change

To test H0: Σ rp = Σ rq = 0.5, form

$$t = \frac{\sum rp - 0.5}{\sqrt{\dfrac{1}{12m} + \dfrac{1}{12n} + \dfrac{1}{12mn}}}$$

m = n = 27. So t = (0.411 - 0.5)/sqrt(0.0063) = -1.12, with d.f. = m+n-2 = 52

Left-tail, critical t (at 95% confidence, d.f. = 52) = -1.675, so do not reject H0

-> We cannot say that this year's scores are any better than last year's

NOTE: If we had another distribution (e.g. p-scores from another school, p°), we could test H0: Σ rp = Σ rp° using q as ref., and var = 1/(12m) + 1/(12n)
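
The arithmetic of this test can be checked with a short Python sketch. It assumes the variance term is 1/(12m) + 1/(12n) + 1/(12mn), which is the form that reproduces the 0.0063 quoted on the slide. The function name `ridit_t_statistic` is illustrative, and the critical value -1.675 is simply copied from the slide rather than computed.

```python
import math


def ridit_t_statistic(mean_ridit, m, n):
    """t statistic for H0: mean ridit = 0.5, with sample sizes m and n.

    Assumed variance: 1/(12m) + 1/(12n) + 1/(12mn).
    """
    variance = 1 / (12 * m) + 1 / (12 * n) + 1 / (12 * m * n)
    return (mean_ridit - 0.5) / math.sqrt(variance), variance


t, var = ridit_t_statistic(0.411, 27, 27)
critical_t = -1.675  # left-tail value at d.f. = 52, as quoted on the slide

print(round(var, 4))    # 0.0063
print(round(t, 2))      # -1.12
print(t < critical_t)   # False, so H0 is not rejected
```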
Borg-scale Fatigue Levels vs. G

•  The  Borg  Scale  measures  physiological  exertion  and  is 
given  over  a  range  of  6  through  20,  with  6  being  ‘No 
exertion  at  all’ 

•  Five  pilots  flew  several  repeat  sorties  at  different  G  levels 
and  recorded  ‘Fatigue’  on  a  modified  Borg  scale  of  0 
through  10  -  (very  similar  to  a  Cooper-Harper  scale) 

•  The  G  levels  were:  G1,  G2,  ...  ,G6,  G7  with  G1  slightly 
above  ground-level  zero  G  as  a  ‘baseline,’  and  G6  and 
G7  being  repeated  maneuvers  at  8G  and  9G  respectively 

•  Is  Fatigue  at  higher  G  levels  significantly  greater  than 
Fatigue  at  G1  ?  No  observed  Fatigue  rating  was  >  5 


[Figure: Adjusted Borg scores for 'Fatigue' vs. increasing G levels (G1 through G7); vertical axis: Fatigue; horizontal axis: jitter(as.numeric(G)).]
Adjusted Borg scores given by pilots flying at increasing G levels

Score  G1  G2  G3  G4  G5  G6  G7  Reference
0      8   6   0   2   0   0   0   8
1      1   3   1   5   3   0   0   1
2      1   1   2   2   6   3   1   1
3      0   0   3   1   1   4   6   0
4      0   0   2   0   0   3   1   0
5      0   0   2   0   0   0   2   0
SUM    10  10  10  10  10  10  10  10
Ridit Analysis of the Borg scores

             G1     G2     G3     G4     G5     G6     G7
ridit value  0.500  0.590  0.975  0.795  0.925  0.985  0.995
probability  0.500  0.252  0.001  0.019  0.002  0.001  0.000

• Ridits for the Borg Fatigue scores vs. increasing G levels, and the probability that each value is less than the G1 reference level (by Bonferroni-adjusted t-value)
Ridit Plot + Confidence Intervals

• Plot of mean ridits and their 95% Bonferroni-adjusted confidence intervals for the Borg Fatigue scores vs. G

• Problem: Several CIs extend beyond the interval (0, 1).

- This implies that our distribution theory is only approximate


COMPARING TWO DISTRIBUTIONS

• In OCD analysis, we really want to test if two score distributions (for example, G1 and G7 scores) differ

• Consider how we'd compare two distributions that we've turned into probabilities, like: G1 -> {p_i}, G7 -> {q_i}

• A 'distance' measure is $H^2 = 1 - \sum_i \sqrt{p_i q_i}$

• H2 is called the 'Squared Hellinger Distance', and is 1 if the {p_i}, {q_i} do not overlap, and in [0, 1) if they do. This seems OK as a 'distance', but there's a problem: in the table below, Gx vs Gy has H2 = 1, which we'd expect, but so do Gx and Gz, even though Gz lies much closer to Gx than Gy does


Score  i  Gx  Gy  Gz
0      1  8   0   0
1      2  0   0   8
2      3  2   0   0
3      4  0   1   2
4      5  0   5   0
5      6  0   4   0
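
The problem can be seen directly by computing the discrete squared Hellinger distance for the three columns of the table. This is a minimal Python sketch; the function name `squared_hellinger` is an illustrative choice.

```python
import math


def squared_hellinger(counts_a, counts_b):
    """Discrete squared Hellinger distance H2 = 1 - sum_i sqrt(p_i * q_i)."""
    na, nb = sum(counts_a), sum(counts_b)
    overlap = sum(math.sqrt((a / na) * (b / nb))
                  for a, b in zip(counts_a, counts_b))
    return 1.0 - overlap


# Counts for scores 0..5, taken from the table
gx = [8, 0, 2, 0, 0, 0]
gy = [0, 0, 0, 1, 5, 4]
gz = [0, 8, 0, 2, 0, 0]

print(squared_hellinger(gx, gy))   # 1.0: no common categories, far apart
print(squared_hellinger(gx, gz))   # 1.0: no common categories, yet adjacent
```

Because the discrete measure only looks at categories the two distributions share, it cannot distinguish 'adjacent' from 'far apart' when there is no shared category.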
FITTING BETA DISTRIBUTIONS

• A solution to this problem is to fit continuous distributions to the observed discrete probability data.

• A flexible distribution to fit, in the case of a discrete distribution lying in [0, 1], is the Beta(α, β) distribution

• For any given discrete probability distribution, we need an estimate of α and β. These are given by the 'Method of Moments' formulas:

$$\alpha = \bar{x}\left(\frac{\bar{x}(1-\bar{x})}{v} - 1\right), \quad \text{if } v < \bar{x}(1-\bar{x}), \quad \bar{x} \text{ is a mean}$$

$$\beta = (1-\bar{x})\left(\frac{\bar{x}(1-\bar{x})}{v} - 1\right), \quad \text{if } v < \bar{x}(1-\bar{x}), \quad v \text{ a variance}$$

$$\bar{x} = \sum_i r_i p_i \text{ for each distribution}, \qquad v = \sum_i p_i (r_i - \bar{x})^2.$$
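
A minimal Python sketch of the method-of-moments fit follows. The function name `fit_beta_moments` is illustrative. The inputs are, as an example only, the G1 proportions (8, 1 and 1 ratings out of 10) paired with the rounded pooled-reference ridits tabulated on the common-domain slide; the fitted values it prints are not figures from the paper.

```python
def fit_beta_moments(values, probs):
    """Method-of-moments Beta(alpha, beta) fit to a discrete distribution on [0, 1].

    values: support points r_i (here, ridits); probs: probabilities p_i.
    """
    mean = sum(r * p for r, p in zip(values, probs))
    var = sum(p * (r - mean) ** 2 for r, p in zip(values, probs))
    if not 0 < var < mean * (1 - mean):
        raise ValueError("method of moments requires 0 < v < mean * (1 - mean)")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


r = [0.114, 0.321, 0.529, 0.750, 0.900, 0.971]   # rounded pooled-reference ridits
p = [0.8, 0.1, 0.1, 0.0, 0.0, 0.0]               # G1 proportions

alpha, beta = fit_beta_moments(r, p)
print(round(alpha, 2), round(beta, 2))
```

By construction, the fitted Beta has the same mean and variance as the discrete distribution it replaces.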
A Common Domain is Required

• Hellinger's Distance requires that the distributions we are comparing be defined over a common domain

• Our reference-derived ridits will serve as the domain over [0, 1], but using G1 as the reference basis pushes us to the left

• A solution is to use the row sums of G1 ... G7 as the reference

• This spreads the domain out better, and gives new means


Score              G1     G2     G3     G4     G5     G6     G7     r
                   p1*r   p2*r   p3*r   p4*r   p5*r   p6*r   p7*r
0                  0.091  0.069  0.000  0.023  0.000  0.000  0.000  0.114
1                  0.032  0.096  0.032  0.161  0.096  0.000  0.000  0.321
2                  0.053  0.053  0.106  0.106  0.317  0.159  0.053  0.529
3                  0.000  0.000  0.225  0.075  0.075  0.300  0.450  0.750
4                  0.000  0.000  0.180  0.000  0.000  0.270  0.090  0.900
5                  0.000  0.000  0.194  0.000  0.000  0.000  0.194  0.971
Mean Ridits = SUM  0.176  0.218  0.737  0.364  0.489  0.729  0.787
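
The pooled-reference construction can be sketched in Python as follows, using the adjusted Borg score counts tabulated earlier. The names `counts`, `pooled`, `r` and `means` are illustrative choices.

```python
# Counts of adjusted Borg scores 0..5 at each G level (10 ratings per level)
counts = {
    "G1": [8, 1, 1, 0, 0, 0],
    "G2": [6, 3, 1, 0, 0, 0],
    "G3": [0, 1, 2, 3, 2, 2],
    "G4": [2, 5, 2, 1, 0, 0],
    "G5": [0, 3, 6, 1, 0, 0],
    "G6": [0, 0, 3, 4, 3, 0],
    "G7": [0, 0, 1, 6, 1, 2],
}

# Reference distribution: row sums over G1..G7
pooled = [sum(row) for row in zip(*counts.values())]
total = sum(pooled)

# Ridits of the pooled reference: mass below each score plus half the mass at it
r, below = [], 0.0
for c in pooled:
    r.append(below + 0.5 * c / total)
    below += c / total

# Mean ridit of each G level over the common domain
means = {g: sum(c / sum(col) * rk for c, rk in zip(col, r))
         for g, col in counts.items()}

print([round(x, 3) for x in r])
print({g: round(m, 3) for g, m in means.items()})
```

The printed ridits and mean ridits should agree with the r column and the 'Mean Ridits' row of the table to three decimals.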
[Figure: Observed probability distributions for G1 ... G7]

[Figure: Fitted Beta distributions for G1 ... G7]

[Figure: Cumulated Beta vs. cumulative probabilities; panels for G1.5, G6, G7 and G8; horizontal axis from 0.0 to 1.0]
The H2 Distances Computed

• Hellinger distances for the continuous Beta distributions can now be computed in closed form in terms of the beta-function B.

• Applying the above formulas, obtain {α, β} for all seven distributions and their H2 distances from G1:

              G1      G2      G3      G4      G5      G6      G7
variance      0.996   0.200   0.745   0.376   0.457   0.701   0.753
mean(1-mean)  0.124   0.148   0.059   0.247   0.250   0.023   0.000
alpha         -0.749  -0.212  -0.058  -0.192  -0.220  -0.023  0.000
beta          -0.127  -0.047  -0.863  -0.152  -0.234  -0.943  -1.000
H2            0.000   0.016   0.648   0.164   0.497   0.773   0.855
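
A standard closed form for the squared Hellinger distance between Beta(α1, β1) and Beta(α2, β2), assumed here to be the formula intended, is H2 = 1 - B((α1+α2)/2, (β1+β2)/2) / sqrt(B(α1, β1) B(α2, β2)). A minimal Python sketch follows. The function names are illustrative, and the parameter values in the usage lines are hypothetical, not taken from the table.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b), for a, b > 0."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def squared_hellinger_beta(a1, b1, a2, b2):
    """H2 = 1 - B((a1+a2)/2, (b1+b2)/2) / sqrt(B(a1, b1) * B(a2, b2))."""
    log_overlap = (log_beta((a1 + a2) / 2, (b1 + b2) / 2)
                   - 0.5 * (log_beta(a1, b1) + log_beta(a2, b2)))
    # Clamp at zero to absorb floating-point rounding for identical inputs
    return max(0.0, 1.0 - math.exp(log_overlap))


# Hypothetical parameters, for illustration only
print(squared_hellinger_beta(2.0, 5.0, 2.0, 5.0))   # identical distributions
print(squared_hellinger_beta(2.0, 5.0, 5.0, 2.0))   # mirror-image distributions
```

Unlike the discrete measure, this version gives a graded distance between non-overlapping score patterns, because the fitted densities share the whole interval (0, 1) as their domain.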
Use H2 to test distribution differences

• Now that we have the distance from G1 (as a reference) to all the other six distributions, we can use the H2's to run randomization tests that estimate how probable each observed distance is under a null hypothesis of the type H0: G1 = G2:

1. Given: H2(G1, G2) = 0.016

2. Take a random sample of size n = 10 from G1, and another from G1 again. Compute H2 between these two samples. Do this 10,000 times

3. Compute {number of times H2 > 0.016} / 10000. This is the estimated probability P that an H2 at least as large as 0.016 will occur given H0 is true

4. P = 0.789, so H0 cannot be rejected. All prob.'s shown below:

       G1     G2     G3     G4     G5     G6     G7
H2     0.000  0.016  0.648  0.164  0.437  0.773  0.855
Prob.  1.000  0.783  0.000  0.067  0.000  0.000  0.000
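
The four steps above can be sketched in Python under some loudly stated assumptions: samples are drawn from a fitted Beta distribution standing in for G1, each sample is re-fitted by the method of moments, and H2 is computed from the closed form for two Beta distributions. The slides do not spell out these details, so this is one possible reading rather than the authors' procedure. The Beta parameters (1.3, 6.0) are hypothetical, and all function names are illustrative.

```python
import math
import random


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def squared_hellinger_beta(a1, b1, a2, b2):
    log_overlap = (log_beta((a1 + a2) / 2, (b1 + b2) / 2)
                   - 0.5 * (log_beta(a1, b1) + log_beta(a2, b2)))
    return max(0.0, 1.0 - math.exp(log_overlap))


def fit_beta_moments(sample):
    """Method-of-moments Beta fit to a sample of values in (0, 1)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def randomization_p_value(h2_observed, a, b, n=10, reps=10_000, seed=0):
    """Share of null replicates whose H2 is at least the observed H2."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        # Both samples come from the same Beta(a, b), i.e. H0 is true
        s1 = [rng.betavariate(a, b) for _ in range(n)]
        s2 = [rng.betavariate(a, b) for _ in range(n)]
        h2 = squared_hellinger_beta(*fit_beta_moments(s1), *fit_beta_moments(s2))
        if h2 >= h2_observed:
            exceed += 1
    return exceed / reps


# Hypothetical reference parameters; 0.016 is the observed H2(G1, G2) from the slide
p_value = randomization_p_value(0.016, a=1.3, b=6.0)
print(p_value)
```

A large value of `p_value` means distances of this size arise routinely between two samples from the same distribution, so the observed distance gives no evidence against H0.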
Use Beta distributions for confidence intervals on differences between means

1. Draw two samples, each of size n = 10, at random, one from each of two Beta distributions

2. Compute the difference between the means.

3. Do 1 & 2 10,000 times

4. Compute the 100(1 - alpha/2k)% C.I. quantiles based on proportions, where k = number of comparisons we will make (this gives the Bonferroni correction for a 95% overall confidence level; k = 6; upper / lower quantile = 99.58% / 0.42%).

This method, comparing G1's ridit mean to all other means, gives C.I.'s showing that G3, (G5), G6 & G7 are all > G1: the probability that their OCD distributions are to the right of G1 is confirmed
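
A minimal Python sketch of steps 1 to 4 follows. The function name `mean_difference_ci` and the two sets of Beta parameters are hypothetical, chosen only to illustrate the mechanics; the interval it prints is not a result from the paper.

```python
import random


def mean_difference_ci(a1, b1, a2, b2, n=10, reps=10_000,
                       alpha=0.05, k=6, seed=0):
    """Randomization C.I. for the difference in means of two Beta distributions.

    The tail area alpha / (2k) applies the Bonferroni correction for k comparisons.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        m1 = sum(rng.betavariate(a1, b1) for _ in range(n)) / n
        m2 = sum(rng.betavariate(a2, b2) for _ in range(n)) / n
        diffs.append(m2 - m1)
    diffs.sort()
    tail = alpha / (2 * k)                      # 0.05 / 12, about 0.42%
    lower = diffs[int(tail * reps)]
    upper = diffs[min(reps - 1, int((1 - tail) * reps))]
    return lower, upper


# Hypothetical reference and comparison distributions
lower, upper = mean_difference_ci(1.3, 6.0, 6.0, 2.0)
print(round(lower, 3), round(upper, 3))
```

Since every simulated mean lies in (0, 1), each difference lies in (-1, 1), so an interval built this way cannot stray outside the admissible range. An interval that excludes zero indicates that the comparison distribution sits to the right (or left) of the reference.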
Summary & Conclusions

•  It  is  important  to  use  ridits,  or  some  other  nonparametric  method, 
when  comparing  different  flight-test  situations  with  ordinal  categorical 
data  (OCD)  ratings  such  as  Cooper-Harper,  or  the  Borg  scale 

•  RIDIT  ANALYSIS  is  recommended  as  a  simple  technique  to  replace 
the  incorrect  use  of  ordinal  categorical  data  as  ratio-scale  numbers. 

•  We  have  demonstrated  the  use  of  ridit  analysis  in  its  standard  form, 
and  examined  it  for  the  case  of  student  scores  and  Borg-scale  ratings 

• We have shown that ridit analysis applies to these cases, and that a new method (using fitted Beta distributions, a distance-based metric, and randomization trials) produces comparisons of mean values, and C.I.'s, that give valid probability results.